Augmenting the power of LSI in text retrieval: Singular value rescaling
Abstract
This paper analyzes several LSI (latent semantic indexing) query approaches and proposes a novel rescaling technique, singular value rescaling (SVR). Experiments on a standardized TREC data set confirmed the effectiveness of SVR, showing an improvement ratio of 5.9% over the best conventional LSI query approach. We also compared SVR with another scaling technique for text retrieval, iterative residual rescaling (IRR); experiments on the TREC data set show that SVR performs better than IRR. © 2007 Elsevier B.V. All rights reserved.
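The abstract does not spell out the query formulas, so the following is only a minimal sketch of the general idea, assuming a truncated-SVD LSI model in which each latent axis is weighted by its singular value raised to an exponent. The function name `lsi_scores`, the parameter `alpha`, and the toy term-document matrix are all hypothetical; the power-law weighting is an illustrative assumption, not the paper's definition of SVR.

```python
import numpy as np

def lsi_scores(term_doc, query, k, alpha=1.0):
    """Cosine scores of a query against documents in a k-dimensional LSI space.

    ASSUMPTION: `alpha` is a hypothetical exponent applied to the singular
    values. alpha=1 compares U_k^T q with Sigma_k V_k^T; alpha=0 compares
    Sigma_k^{-1} U_k^T q with V_k^T. Other values rescale the latent axes.
    Assumes the k leading singular values are nonzero.
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T

    weights = s_k ** alpha                    # per-axis rescaling factors
    docs = V_k * weights                      # (n_docs, k) document coordinates
    q = (U_k.T @ query) / s_k * weights       # (k,) query coordinates

    # Cosine similarity, returning 0 where a vector has zero length.
    denom = np.linalg.norm(docs, axis=1) * np.linalg.norm(q)
    return np.divide(docs @ q, denom, out=np.zeros_like(denom), where=denom > 0)

# Toy collection (rows: car, auto, engine, flower, petal; columns: 4 documents).
A = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
car_query = np.array([1, 0, 0, 0, 0], dtype=float)

scores = lsi_scores(A, car_query, k=2, alpha=1.5)
ranking = np.argsort(-scores)  # document indices, best match first
```

In this toy collection the second document contains "auto" but not "car", yet it scores as highly as the first because both share the latent "engine" axis. That synonymy effect is what LSI is known for; how much the choice of `alpha` matters on real data is an empirical question that this sketch does not answer.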
Similar resources
LRLW-LSI: An Improved Latent Semantic Indexing (LSI) Text Classifier
The task of Text Classification (TC) is to automatically assign natural language texts to thematic categories from a predefined category set. Latent Semantic Indexing (LSI) is a well-known technique in Information Retrieval, especially for dealing with polysemy (one word can have different meanings) and synonymy (different words are used to describe the same concept), but it is not an opti...
An Application of LSI and M-tree in Image Retrieval
When dealing with image databases, we often need to solve the problem of how to retrieve a desired set of images effectively and efficiently. Images are commonly represented by high-dimensional vectors of extracted features, since this turns content-based image retrieval into a geometric search problem. In this article we present a case study of feature...
Framework for Document Retrieval using Latent Semantic Indexing
Today, with the rapid development of the Internet, textual information is growing rapidly, so document retrieval, which aims to find and organize relevant information in text collections, is needed. With the availability of large-scale, inexpensive storage, the amount of information stored by organizations will increase. Searching for information and deriving useful facts will become more cumbersom...
Lower Dimensional Representation of Text Data in Vector
Dimension reduction in today's vector space based information retrieval systems is essential for improving computational efficiency in handling massive data. In this paper, we propose a mathematical framework for lower dimensional representation of text data in vector space based information retrieval using minimization and matrix rank reduction formula. We illustrate how the commonly used Latent ...
Clustered SVD strategies in latent semantic indexing
The text retrieval method using latent semantic indexing (LSI) technique with truncated singular value decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term–document matrix and improves the information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small homogeneous data collect...
Journal title:
- Data Knowl. Eng.
Volume 65, Issue
Pages -
Publication date 2008